138 research outputs found

    Building a free French wordnet from multilingual resources

    This paper describes the automatic construction of a freely available wordnet for French (WOLF) based on Princeton WordNet (PWN) using various multilingual resources. Polysemous words were handled with an approach in which a parallel corpus for five languages was word-aligned and the extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. For monosemous words, by contrast, a bilingual approach sufficed to acquire equivalents. Bilingual lexicons were extracted from Wikipedia and thesauri. The results obtained from each resource were merged and ranked according to the number of resources yielding the same literal. The merged wordnet was evaluated automatically against the French WordNet (FREWN), and a sample of the generated synsets was also evaluated manually. The precision obtained suggests that the approach is very promising, and applications that use the created wordnet are already planned.
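    The merge-and-rank step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the resource names, synset identifiers, and French literals below are hypothetical, and the only thing taken from the abstract is the idea of ranking each candidate literal by the number of resources that propose it.

```python
from collections import defaultdict


def merge_and_rank(resources):
    """Merge (synset_id, literal) candidates and rank them by resource count.

    `resources` maps a resource name to the list of candidate pairs it yields.
    Returns {synset_id: [(literal, n_resources), ...]}, best-supported first.
    """
    # Collect the set of resources that proposed each (synset, literal) pair.
    votes = defaultdict(set)
    for name, pairs in resources.items():
        for synset_id, literal in pairs:
            votes[(synset_id, literal)].add(name)

    # Group literals per synset, keeping the number of supporting resources.
    merged = defaultdict(list)
    for (synset_id, literal), sources in votes.items():
        merged[synset_id].append((literal, len(sources)))

    # Rank by descending support; break ties alphabetically for determinism.
    for candidates in merged.values():
        candidates.sort(key=lambda item: (-item[1], item[0]))
    return dict(merged)


# Hypothetical candidates from three hypothetical resources.
resources = {
    "parallel_corpus": [("syn-001", "chien"), ("syn-002", "maison")],
    "wikipedia": [("syn-001", "chien"), ("syn-001", "cabot")],
    "thesaurus": [("syn-001", "chien"), ("syn-002", "maison"), ("syn-002", "demeure")],
}
ranked = merge_and_rank(resources)
```

    Counting distinct resources rather than raw occurrences keeps a single noisy resource from inflating the rank of a literal it repeats many times.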

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    Fear and Loathing on Twitter: Attitudes towards Language

    The paper deals with the sociolinguistic concept of prestige imbued in the notion of standard language, and the social status connected to the inherent language skill (or lack thereof). To this end, we analyse Slovenian tweets pertaining to language use and the (in-)correctness of other users' language, and propose a typology, particularly for cases where language use is invoked as an argument against someone's qualifications or beliefs.

    The conference Slovenščina na spletu in v novih medijih (Slovene on the Web and in New Media)

    From 25 to 27 November 2015, the academic conference Slovenščina na spletu in v novih medijih (Slovene on the Web and in New Media) took place in the GIAM ZRC SAZU hall in Ljubljana. The conference was held within the basic research project JANES, which is funded by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) between 2014 and 2017. It was co-organised by the Faculty of Arts of the University of Ljubljana, the Slovenian Language Technologies Society, the Slovenian research infrastructure for language resources and technologies CLARIN.SI, and the regional initiative for language data RelDI. The first day of the conference was devoted to a full-day seminar on statistics for linguists, led by Asst. Prof. Dr. Maja Miličević of the University of Belgrade. The 25 participants were introduced to the basics of quantitative methods in corpus linguistics, to descriptive and inferential statistics, and to ways of visualising linguistic data and the R software package. The seminar materials are available on the conference website.

    Slovenščina 2.0: "Computer-Mediated Communication"


    JANES Research Camp on Internet Slovene for Secondary-School Students

    From 24 to 28 August 2015, the JANES Research Camp on Internet Slovene for Secondary-School Students (Raziskovalni tabor spletne slovenščine za srednješolce JANES) took place at the Department of Translation of the Faculty of Arts, University of Ljubljana. The camp was organised within JANES (Jezikoslovna analiza nestandardne slovenščine, Linguistic Analysis of Nonstandard Slovene), a national basic research project (J6-6842) funded by the Slovenian Research Agency (Javna agencija za raziskovalno dejavnost Republike Slovenije) from 1 July 2014 to 30 June 2017. The camp was co-funded by the Ministry of Culture with funds from the call for presenting, promoting and developing the Slovene language (JPR-UPRS-2015).

    Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings

    The notions of concreteness and imageability, traditionally important in psycholinguistics, are gaining significance in semantic-oriented natural language processing tasks. In this paper we investigate the predictability of these two concepts via supervised learning, using word embeddings as explanatory variables. We perform predictions both within and across languages by exploiting collections of cross-lingual embeddings aligned to a single vector space. We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages. We further show that the cross-lingual transfer via word embeddings is more efficient than the simple transfer via bilingual dictionaries
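    The supervised setup described in this abstract, with embeddings as explanatory variables and a rating as the target, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: ridge regression stands in for whatever learner the authors used, all embeddings and ratings are synthetic, and the "target language" is simulated by adding noise to vectors in the same space to mimic imperfect cross-lingual alignment.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha I) w = X^T y."""
    dim = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(dim), X.T @ y)


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(0)
dim = 50
# Assumption: ratings are a noisy linear function of the shared-space vectors.
true_w = rng.normal(size=dim)


def make_language(n_words, alignment_noise):
    """Return synthetic (embeddings, ratings) for one hypothetical language."""
    clean = rng.normal(size=(n_words, dim))
    ratings = clean @ true_w + rng.normal(scale=1.0, size=n_words)
    # Observed vectors deviate from the shared space by the alignment noise.
    observed = clean + rng.normal(scale=alignment_noise, size=clean.shape)
    return observed, ratings


X_src, y_src = make_language(2000, alignment_noise=0.0)
X_tgt, y_tgt = make_language(500, alignment_noise=0.3)

# Train on part of the source language only.
w = fit_ridge(X_src[:1500], y_src[:1500])

# Within-language: held-out source words. Across: target-language words.
within = pearson(X_src[1500:] @ w, y_src[1500:])
across = pearson(X_tgt @ w, y_tgt)
print(f"within-language r = {within:.3f}, cross-language r = {across:.3f}")
```

    Because the model is trained only on source-language words and applied unchanged to target-language vectors, any drop from `within` to `across` in this toy setting comes solely from the simulated alignment noise.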
